ABSTRACT The verrucomicrobial subdivision 2 class Spartobacteria is one of the most abundant bacterial lineages in soil and has
recently also been found to be ubiquitous in aquatic environments. A 16S rRNA gene study from samples spanning the entire 
salinity range of the Baltic Sea indicated that, in the pelagic brackish water, a phylotype of the Spartobacteria is one of the domi- 
nating bacteria during summer. Phylogenetic analyses of related 16S rRNA genes indicate that a purely aquatic lineage within 
the Spartobacteria exists. Since no aquatic representative from the Spartobacteria has been cultured or sequenced, the metabolic 
capacity and ecological role of this lineage are yet unknown. In this study, we reconstructed the genome and metabolic potential 
of the abundant Baltic Sea Spartobacteria phylotype by metagenomics. Binning of genome fragments by nucleotide composition 
and a self-organizing map recovered the near-complete genome of the organism, the gene content of which suggests an aerobic 
heterotrophic metabolism. Notably, we found 23 glycoside hydrolases that likely allow the use of a variety of carbohydrates, like 
cellulose, mannan, xylan, chitin, and starch, as carbon sources. In addition, a complete pathway for sulfate utilization was found, 
indicating catabolic processing of sulfated polysaccharides, commonly found in aquatic phytoplankton. The high frequency of 
glycoside hydrolase genes implies an important role of this organism in the aquatic carbon cycle. Spatiotemporal data of the 
phylotype's distribution within the Baltic Sea indicate a connection to Cyanobacteria that may be the main source of the polysac- 
charide substrates. 

IMPORTANCE The ecosystem roles of many phylogenetic lineages are not yet well understood. One such lineage is the class Spar- 
tobacteria within the Verrucomicrobia that, despite being abundant in soil and aquatic systems, is relatively poorly studied. Here 
we circumvented the difficulties of growing aquatic Verrucomicrobia by applying shotgun metagenomic sequencing on a water 
sample from the Baltic Sea. By using a method based on sequence signatures, we were able to in silico isolate genome fragments 
belonging to a phylotype of the Spartobacteria. The genome, which represents the first aquatic representative of this clade, en- 
codes a diversity of glycoside hydrolases that likely allow degradation of various complex carbohydrates. Since the phylotype 
cooccurs with Cyanobacteria, these may be the primary producers of the carbohydrate substrates. The phylotype, which is highly 
abundant in the Baltic Sea during summer, may thus play an important role in the carbon cycle of this ecosystem. 
Representatives of the bacterial phylum Verrucomicrobia are morphologically
diverse and are present in various terrestrial and
aquatic habitats, including oligotrophic, eutrophic, extreme, pol- 
luted, and manmade ones (1). The few existing isolates have been 
collected from diverse ecological niches, including soil, freshwa- 
ter, marine habitats, and feces, displaying aerobic, facultative an- 
aerobic, or obligate anaerobic heterotrophic lifestyles. These Ver- 
rucomicrobia cultivars have the capacity to utilize various carbon 
compounds, such as plant polymers, for example, cellulose, xylan, 
and pectin (2, 3), and sugars or methane (4, 5). In addition, Ver- 
rucomicrobia have been found to be involved in nitrogen fixation 
in termites (6) and soil (7). 

A recent 16S rRNA study of 181 soils, sampled across different 
soil types and continents, found Verrucomicrobia to be highly
abundant, averaging 23% of the 16S rRNA gene sequences per
sample (8). This by far exceeded previous estimates, which was 
attributed to primer mismatches in commonly used 16S rRNA 
primers (8). In most of these samples, the Verrucomicrobia were 
dominated (92% in total) by sequences belonging to subdivision 2 
class Spartobacteria (8). This class comprises one of the primary 
lineages in the phylum Verrucomicrobia (9) but has currently only 
one cultivated representative, Chthoniobacter flavus, an aerobic 
heterotrophic bacterium isolated from pasture soil that is able to 
grow on carbohydrate components of plant biomass (10). Al- 
though the majority of the Spartobacteria appear to be soil inhab- 
iting (9), they have also been detected as endosymbionts of nem- 
atode worms ("Xiphinematobacter" [11]) and in aquatic 
environments (e.g., see references 12 to 15). In a global investigation
of the distribution and diversity of marine Verrucomicrobia,
Spartobacteria were negatively correlated with salinity and were 
the dominant group in most areas with low salinities (16). The 
authors speculated that aquatic Verrucomicrobia play a significant 
role in aquatic ecosystems. Although it is well known that bacteria 
are the main consumers of organic matter in the aquatic environ- 
ment, the specific roles of individual organisms are not well un- 
derstood (17). Key enzymes for degradation of polysaccharides 
derived from plant/algae biomass are glycoside hydrolases (GHs) 
containing single or multiple catalytic modules frequently at- 
tached to one or more accessory noncatalytic carbohydrate bind- 
ing modules (CBMs) (18, 19). This class of enzymes is well repre- 
sented in soil-isolated Verrucomicrobia, including Chthoniobacter 
flavus (20). 

We recently performed a 16S rRNA gene sequence analysis of 
213 samples collected in summer that spanned the entire salinity 
range of the Baltic Sea (21). This revealed pronounced shifts in the 
bacterial community at different phylogenetic levels along the sa- 
linity gradient. The pyrosequencing reads in the brackish water 
environment (between salinities 5 and 8) were dominated (>10%
of reads in many samples) by an operational taxonomic unit 
(OTU) belonging to the Spartobacteria. Despite elaborate at- 
tempts, no aquatic Spartobacteria have been isolated (e.g., see ref- 
erence 22) and no genome has been sequenced so far. To get a 
better understanding of the physiology and ecological role of this 
organism, and of aquatic Spartobacteria in general, we applied 
shotgun metagenomics to a sample with high abundance of the 
Spartobacteria OTU, resulting in the first reconstruction of an 
aquatic Spartobacteria genome. The metagenome analysis re- 
vealed a rich repertoire of glycoside hydrolases, and the spatiotem- 
poral distribution of the OTU suggests a connection to 
phytoplankton-derived polysaccharides. 

RESULTS AND DISCUSSION 

Metagenome of "Spartobacteria baltica" bin. Metagenomic se- 
quencing of the surface water sample yielded 37,658,923 bp of 454 
pyrosequencing data that was assembled into 58,176 contigs. In 
order to isolate contigs belonging to the target genome, we used an 
emerging self-organizing map (ESOM) approach that clusters ge- 
nome fragments into phylogenetic groups based on tetranucle- 
otide frequency distributions (23, 24). Since our previous 16S 
rRNA gene sequencing indicated that Verrucomicrobia are highly 
dominated by a single operational taxonomic unit (OTU) in this 
sample (21), the risk for extensive coclustering of other verruco- 
microbial genomes was considered low. The resulting ESOM con- 
tained a distinct region highly enriched in contigs with best 
BLAST matches to Verrucomicrobia (Fig. 1). The region was sep- 
arated by a ridge, indicating large differences in tetranucleotide 
frequencies, from surrounding areas containing Actinobacteria, 
Bacteroidetes, Proteobacteria, and Cyanobacteria and a heteroge- 
neous area containing a mixture of clades. The contigs within the 
region were assigned to a "Spartobacteria baltica" bin. 

FIG 1 Emerging self-organizing map of the metagenome contigs. Pixels are
colored according to the taxonomic annotation of the contig(s) that occupies the
pixel (legend: Verrucomicrobia, Actinobacteria, Cyanobacteria, Proteobacteria,
Bacteroidetes). Background color represents the distance in data space between the
pixels in the neighborhood; hence the white ridges represent borders between
regions of highly dissimilar tetranucleotide frequency distributions.

The "Spartobacteria baltica" metagenome bin contained 334
contigs with an average length of 5.4 kb (7.0× mean coverage) and
a total of 1,811,214 bp (Table 1). The contigs had a relatively high
GC content (62%), which is in the range of currently sequenced
Verrucomicrobia genomes (25), with the exception of the methanotrophic
thermophile "Candidatus Methylacidiphilum infernorum."
The "Spartobacteria baltica" bin carries 2,226 predicted
genes, of which 2,189 (98%) encode proteins. Only one contig
with a 16S rRNA gene was found. Since the sequence coverage of
the 16S rRNA gene (9×) was not significantly above the average of
the "Spartobacteria baltica" bin (7×), the genome is likely to encode
a single ribosomal RNA operon. The V3-V4 region of the 16S
rRNA gene was identical to the sequence of the Spartobacteria 
OTU that we previously detected in high abundance in this sample 
(21). The 16S rRNA gene also had high similarity to cloned 16S 
sequences (97 to 99% identity) obtained earlier in the central Bal- 
tic Sea (22). Thirty-three genes encoding tRNAs for 17 standard 
amino acids were found (see Table S1 in the supplemental material).
Of the protein-coding genes, 1,533 (69%) were functionally
predicted, 621 (28%) were assigned to KEGG maps, 1,404 (63%) 
were assigned to clusters of orthologous groups (COGs), and 
1,443 (65%) were assigned to specific domains in the Pfam data- 
base. In comparison with Chthoniobacter flavus, "Spartobacteria 
baltica" has a high overall functional assignment of the predicted 
proteins (Table 1). 
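
The summary statistics quoted above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the text and Table 1, the variable names are hypothetical, and the copy-number estimate from the coverage ratio is a rough heuristic assumed here, not a method described in the paper. The reported percentages are reproduced when the total gene count (2,226) is used as the denominator.

```python
# Values copied from the text and Table 1; all names are illustrative.
total_bp = 1_811_214      # total bases in the bin
n_contigs = 334           # number of contigs
coding_bp = 1_657_652     # coding bases
genes_total = 2_226       # predicted genes
genes_protein = 2_189     # protein-coding genes


def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


mean_contig_kb = total_bp / n_contigs / 1000
coding_density = pct(coding_bp, total_bp)
protein_fraction = pct(genes_protein, genes_total)

# Annotation categories reported in the text, as a share of all genes.
annotated = {"function prediction": 1_533, "KEGG": 621, "COG": 1_404, "Pfam": 1_443}
annotation_pct = {name: round(pct(n, genes_total)) for name, n in annotated.items()}

# Assumed heuristic: rRNA operon copies ~ 16S coverage / mean bin coverage.
cov_16s, cov_bin = 9.0, 7.0
rrna_copies = max(1, round(cov_16s / cov_bin))

print(f"mean contig length: {mean_contig_kb:.1f} kb")
print(f"coding density: {coding_density:.1f}%")
print(f"protein-coding genes: {protein_fraction:.0f}%")
print(annotation_pct)
print(f"estimated rRNA operons: {rrna_copies}")
```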

TABLE 1 Comparison of metagenome/genome properties from the "Spartobacteria baltica" bin and Chthoniobacter flavus (56), the closest relative
that has been genome sequenced

Property                                    "Spartobacteria baltica"   Chthoniobacter flavus
No. of genes                                2,226                      6,778
No. of bases                                1,811,214                  7,848,700
No. of coding bases                         1,657,652                  6,925,094
GC (%)                                      62                         61
No. of DNA scaffolds                        334                        62
No. of RNAs                                 37                         62
No. of rRNAs                                3                          4
  5S                                        1                          2
  16S                                       1                          1
  23S                                       1                          1
No. of tRNAs                                34                         58
No. of genes with function prediction (%)   1,533 (69)                 3,584 (53)
No. of genes with Pfam assignment (%)       1,443 (65)                 4,074 (60)
No. of genes with COG assignment (%)        1,541 (69)                 3,658 (54)
Total no. of COG IDs (a)                    964                        1,426
No. of shared COG IDs (b)                   831                        831
No. of unique COG IDs (b)                   133                        595
No. of contigs                              334
Avg coverage (×)                            7.0
[row label missing]                         904,350/6,763
N75/L75                                     1,358,441/3,933

(a) IDs, identifications.
(b) Comparing Chthoniobacter flavus and "Spartobacteria baltica."

To assess the completeness and purity of the "Spartobacteria
baltica" bin, we used a set of 40 housekeeping genes known to
normally occur in single copies (see Table S2 in the supplemental
material) (26, 27). Thirty-eight of the 40 genes were found, distributed
over 17 contigs. Three were found in multiple copies
(duplicates), but two of these duplicates were found adjacent on
the contig, indicating that sequencing errors had resulted in split
genes. Placing the above-described 17 contigs in a reference phylogenetic
tree based on their housekeeping genes (26) shows that
all contigs are inferred to belong to the Verrucomicrobia (Fig. S1).
Given that we recapture all but two of the housekeeping genes and
that these, with few exceptions, occur in single copies, the "Spartobacteria
baltica" bin likely represents a substantial fraction of a
single genome (28), although some fragments are likely missing
due to incomplete coverage (see Materials and Methods). The
binning approach may also fail to group contigs having markedly
different tetranucleotide compositions, such as recently acquired 
genome fragments of distant phylogenetic origin (29). Redoing 
the binning after spiking the metagenome with artificial contigs of 
the closest sequenced relative (Chthoniobacter flavus Ellin428) 
did, however, generate a cohesive cluster for this genome (Fig. S2), 
suggesting that the method should be accurate also for "Sparto- 
bacteria baltica" in this metagenomic context. 
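
As a rough illustration of the marker-gene reasoning above, the counts in the text (40 markers, 38 found, 3 in multiple copies, 2 of those explained by split genes) can be turned into simple recovery and redundancy figures. This is a minimal sketch under stated assumptions: the function name and the way redundancy is expressed are hypothetical choices made here, not the authors' procedure, and marker recovery is only a proxy for genome completeness.

```python
def marker_summary(n_markers, n_found, n_multicopy, n_split_artifacts):
    """Summarise single-copy marker recovery for a genome bin.

    Returns (recovery %, multi-copy markers not explained by split genes,
    those unexplained markers as a % of the markers found).
    """
    recovery = 100.0 * n_found / n_markers
    # Multi-copy markers that are not split-gene artifacts may point to
    # contamination or to genuine duplication.
    unexplained = n_multicopy - n_split_artifacts
    redundancy = 100.0 * unexplained / n_found
    return recovery, unexplained, redundancy


# Counts taken from the text.
recovery, unexplained, redundancy = marker_summary(40, 38, 3, 2)
print(f"marker recovery: {recovery:.0f}%")
print(f"unexplained multi-copy markers: {unexplained} ({redundancy:.1f}% of found)")
```

A high recovery combined with few unexplained duplicates is what the text interprets as a substantial fraction of a single genome.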

An advantage of metagenomics over isolate sequencing
is that it gives direct insight into population heterogeneity
(e.g., see references 30 and 31). Manual inspection of patterns
of single nucleotide polymorphisms on aligned reads in a
random selection of "Spartobacteria baltica" contigs suggested 
that the metagenome bin represents a population of closely related 
strains (similar enough to coassemble) that undergo recombina- 
tion (see Fig. S3 in the supplemental material for an example). 
Deeper genomic coverage is needed to assess the population struc- 
ture in detail. 

Phylogenetic analysis of "Spartobacteria baltica." A phyloge- 
netic analysis of a concatenation of 31 of the single-copy genes 
confirmed our previous 16S rRNA-based placement of "Sparto- 
bacteria baltica" within the class Spartobacteria (21) in the super- 
phylum Planctomycetes, Verrucomicrobia, and Chlamydia (PVC) 
(Fig. 2). The class Spartobacteria currently comprises one validly 
described order (Chthoniobacterales) and family (Chthoniobacte- 
riaceae) (9). The pasture soil isolate Chthoniobacter flavus Ellin428 
(10, 20) is the closest cultured and sequenced relative (Fig. 2) but 
is still phylogenetically distant from "Spartobacteria baltica" (0.48 
branch length distance in Fig. 2). Interestingly, C. flavus has a 
nearly three-times-larger genome (Table 1), which corroborates 
an earlier estimate of genome sizes based on metagenome data, 
which indicated considerably larger bacterial genomes in soil than 
in marine environments (32). Based on the 16S rRNA gene
phylogeny of the Silva 111 SSU Ref NR tree (33),
"Spartobacteria baltica" belongs to the lineage "LD29." Detailed
phylogenetic analysis of 16S rRNA genes from this lineage shows 
that solely environmental sequences derived from brackish, fresh- 
water, and wastewater environments are found in this lineage 
(Fig. 3; see also Fig. S4 in the supplemental material). The most 
similar (99% identity) 16S rRNA sequences are from a freshwater 
lake in the Netherlands (GenBank accession number AF009975) 
and short sequences from the Baltic Sea (GenBank accession 
number EF627955). The lineage "LD29" was among the first 16S 
rRNA sequences of the Spartobacteria found by Zwart et al. (15).
Therefore, we named the first genomically characterized phylotype
of this lineage "Spartobacteria baltica."

FIG 2 Maximum likelihood tree based on a concatenated alignment of 31
conserved genes of "Spartobacteria baltica" and representative genomes of
Verrucomicrobia (10 representatives), Chlamydiae (5 representatives), Planctomycetes
(8 representatives), and Spirochaetes/Actinobacteria (21 representatives).
The tree was rooted using the Spirochaetes/Actinobacteria as an outgroup.
All groups except the Verrucomicrobia have been grouped into wedges
for clarity. Dots indicate bootstrap values of >98%.
[Tree not reproduced. Verrucomicrobia tips: "Spartobacteria baltica,"
Chthoniobacter flavus, Verrucomicrobium spinosum, Akkermansia muciniphila,
Pedosphaera parvula, "Methylacidiphilum infernorum," Diplosphaera colitermitum,
Opitutus terrae, Coraliomargarita akajimensis, "Verrucomicrobium DG1235."
Collapsed groups: Chlamydiae, Planctomycetes, Actinobacteria/Spirochaetes.]

FIG 3 Phylogenetic tree of nonredundant sequences of >1,200 bp in the
Spartobacteria class obtained from the Silva database 111 SSU Ref NR. "Candidatus
Methylacidiphilum infernorum" was used to root the tree. The tree
was calculated using the RAxML algorithm with rapid bootstrap analysis
(1,000 bootstraps). Only nodes supported by high bootstrap values are marked
(filled circles, >95%). The origins of the sequences are indicated by the accession
number, the isolation source, and the length of the sequence in bp.
[Tree not reproduced; scale bar, 0.11. Tips: HM127782, Lake Qinghai, 1488;
HM127735, Lake Qinghai, 1486; AF009975, Lake Loosdrecht, 1482;
"Spartobacteria baltica," 1443; HM129028, Lake Kelike, 1467;
HM129403, Lake Kelike, 1502; HM129425, Lake Kelike, 1484;
FJ612208, Lake Dongping, 1641; FJ612390, Lake Dongping, 1455;
JN869104, Lake Taihu, 1247; JN868841, Lake Taihu, 1554;
HM238169, gas biofilter, 1487; JN541192, activated sludge, 1529;
Chthoniobacteriaceae; GQ402801, forest soil, 1483; DQ450786, meadow soil, 1418;
GQ402642, forest soil, 1481; "Xiphinematobacteriaceae."]

Metabolism of "Spartobacteria baltica." A reconstruction of 
the energy metabolism by manual annotation of the metagenome 
bin revealed that "Spartobacteria baltica" uses a set of pathways 
typical of many aerobic heterotrophic organisms (see Table S3 in 
the supplemental material). Glucose can be converted to glucose 
6-phosphate and degraded to pyruvate via the typical Embden- 
Meyerhof pathway (EMP). Pyruvate is further oxidized to acetyl 
coenzyme A (acetyl-CoA) that is used in the tricarboxylic acid 
cycle (TCA). The presence of fructose 1,6-bisphosphatase indi- 
cates the possibility for gluconeogenesis via the EMP, and the 
presence of genes coding for 2-oxoglutarate dehydrogenase, suc- 
cinate dehydrogenase, and succinyl-CoA synthetase indicates a 
complete tricarboxylic acid cycle. The products of the TCA cycle 
and Embden-Meyerhof pathway are precursors of several amino 
acids. The pathways for the formation of L-alanine, L-valine,
L-leucine, L-isoleucine, L-serine, and L-glycine, starting with intermediates
of the EMP, are fully represented by the corresponding
genes (Table S3). Also, biosynthetic pathways for the formation of
L-aspartate, L-glutamate, L-glutamine, L-proline, L-threonine,
L-lysine, and L-histidine from precursors of the TCA cycle were
found. The biosynthetic pathways for L-arginine, L-methionine,
and L-cysteine are not complete. Although we found the genes for
a complete pentose phosphate pathway, which is involved in the
regeneration of NADPH but also generates precursors for
L-tryptophan, L-phenylalanine, and L-tyrosine biosynthesis, a
few genes needed for complete pathways to these three amino acids are
missing. However, single enzymes of these more-complex pathways may
be missing due to incomplete genome coverage. The same applies
to the biosynthesis of purine and pyrimidine nucleotides
and to the genes coding for lipopolysaccharide and peptidoglycan
biosynthesis (Table S3). However, although Verrucomicrobia are
described as having a Gram-negative cell wall, the class
Opitutae has been reported to lack peptidoglycan (34), suggesting
that Spartobacteria may also lack the corresponding genes.

Although the organism seems to be capable of biosynthesis of
amino acids, essential prerequisites for the use of N2, NO3−, or
NO2− for the generation of nitrogen precursors were not found.
While their absence may reflect incomplete genomic coverage,
many ABC-type transporters involved in spermidine/putrescine
(potABCD), peptide (oppABCDF), and branched-chain amino
acid (livKHMGF) uptake were found in the genome (see Table S3
in the supplemental material), which may indicate the uptake of
organic nitrogen and recycling of the acquired ammonia groups.
Moreover, chitin, a putative substrate of "Spartobacteria baltica,"
can support the nitrogen requirements of the organism (see below)
(35). The "Spartobacteria baltica" metagenome bin has a
complete pstABSC transporter system putatively involved in the
uptake of phosphate and ABC transporters for iron (fhuDBC) and
zinc (znuABC).

The sulfur metabolism is almost complete in the "Sparto- 
bacteria baltica" metagenome. Genes involved in the reduc- 
tion of sulfate to hydrogen sulfide (via adenylylsulfate, 3'- 
phosphoadenylylsulfate, and sulfite), which is a precursor for the 
biosynthesis of L-cysteine by a cysteine synthase, were predicted. 
The sulfur metabolism plays an important role in the degradation 
of phytoplankton-derived polysaccharides since sulfated polysac- 
charides are frequently found in algae and Cyanobacteria (36). The 
metagenome bin also contains genes for a sulfate permease that 
may facilitate uptake of sulfate. 

Polysaccharide-degrading enzymes. The aerobic hetero- 
trophic metabolism described above requires a carbon source for 
the generation of energy and as a substrate for anabolism. Inter- 
estingly, "Spartobacteria baltica" contains several genes encoding 
glycoside hydrolases (GHs), key enzymes to degrade polysaccha- 
ride compounds. In total, 23 GHs representing 13 different GH 
families, as defined in the carbohydrate-active enzyme database 
(CAZy), were detected (18), suggesting the use of several different 
substrates, like cellulose, mannan, xylan, chitin, and starch (Table 
2). 
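
The totals quoted above can be cross-checked against the per-family counts in Table 2. The sketch below assumes one particular reading of the table columns (which count belongs to which organism where only one value is listed); the dictionary names are illustrative. Under that reading, the counts add up to the 23 GHs in 13 families reported for "Spartobacteria baltica" and to the 58 GHs reported in the text for "Verrucomicrobium AAA168-F10."

```python
# GH counts per CAZy family as read from Table 2.
# Families with no listed count for an organism are simply absent.
spartobacteria = {
    "GH1": 4, "GH2": 3, "GH3": 3, "GH5": 2, "GH9": 1, "GH10": 1,
    "GH16": 1, "GH17": 1, "GH18": 2, "GH30": 2, "GH43": 1,
    "GH57": 1, "GH119": 1,
}
aaa168_f10 = {
    "GH1": 2, "GH3": 5, "GH5": 3, "GH9": 3, "GH10": 3, "GH13": 8,
    "GH16": 1, "GH17": 1, "GH26": 1, "GH31": 1, "GH43": 3,
    "GH77": 1, "GH78": 2, "GH81": 5, "GH109": 19,
}

sb_total = sum(spartobacteria.values())
sb_families = len(spartobacteria)
f10_total = sum(aaa168_f10.values())

# Families found in both organisms, and those seen only in the Baltic bin.
shared = sorted(set(spartobacteria) & set(aaa168_f10))
sb_only = sorted(set(spartobacteria) - set(aaa168_f10))

print(f'"Spartobacteria baltica": {sb_total} GHs in {sb_families} families')
print(f'"Verrucomicrobium AAA168-F10": {f10_total} GHs')
print("shared families:", ", ".join(shared))
print('only in "Spartobacteria baltica":', ", ".join(sb_only))
```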

(i) Genes relevant for cellulose degradation. Three identified 
GH-encoding genes may be relevant for cellulose degradation; 
two are GH5 members, and one belongs to the family GH9. One of 
the predicted GH5 proteins has a single GH5 catalytic module, 
whereas the other GH5 member is supplemented with additional 
modules, including a family 6 carbohydrate binding module 
(CBM6) (Fig. 4A). The first GH5 protein sequence cannot be as- 
signed to any subfamily, although it is distantly related to subfam- 
ilies GH5_7 and GH5_41 (Fig. 4B). The closest relatives to this 
sequence are GH5 enzymes from Stackebrandtia nassauensis, a cel- 
lulolytic member of the Actinobacteria (37), and Lentisphaera ara- 
neosa, an exopolymer-producing bacterium (38). The second 
modular GH5 member can be assigned to the recently described 
subfamily GH5_46 (39) (Fig. 4C), a poorly biochemically charac- 
terized GH5 subfamily. Carboxymethyl cellulose (CMC) activity 
has been described for a GH5_46 subfamily isolated from cow 
rumen (40), which currently is the only characterized enzyme in 
this subfamily (see Fig. S5 in the supplemental material). Moreover,
the appended CBM6 module is known to bind to various
β-glycans (41). Notably, the genome of Chthoniobacter flavus Ellin428
also contains two genes coding for GH5 proteins. However,
these two, and a second GH5 representative from the "Verrucomicrobium
AAA168-F10" genome (42), do not cluster together
with the "Spartobacteria baltica" genes in the phylogenetic analysis,
and none of them can currently be assigned to any GH5 subfamily
(data not shown).

TABLE 2 Comparison of CAZyme distributions in "Spartobacteria baltica" and the aquatic subdivision 1 Verrucomicrobia "Verrucomicrobium
AAA168-F10" (46)

CAZy family   No. in "Spartobacteria baltica"   No. in "Verrucomicrobium AAA168-F10"
GH1           4                                 2
GH2           3                                 -
GH3           3                                 5
GH5           2                                 3
GH9           1                                 3
GH10          1                                 3
GH13          -                                 8
GH16          1                                 1
GH17          1                                 1
GH18          2                                 -
GH26          -                                 1
GH30          2                                 -
GH31          -                                 1
GH43          1                                 3
GH57          1                                 -
GH77          -                                 1
GH78          -                                 2
GH81          -                                 5
GH109         -                                 19
GH119         1                                 -

(ii) Genes relevant for chitin degradation. In addition, the 
"Spartobacteria baltica" metagenome bin also reveals two genes 
encoding candidate chitinolytic proteins belonging to the family 
GH18. One of the GH18 proteins contains two CBM2 modules 
and a CBM33 module. Another identified CBM33 module is in- 
dependent, i.e., not connected to any catalytic module. Interest- 
ingly, members of CBM33 were recently shown to have enzymatic 
activity on insoluble substrates like chitin and cellulose (43, 44). 
Via a mechanism involving hydrolysis and oxidation, CBM33 en- 
zymes boost degradation of chitin and cellulose by making crys- 
talline polysaccharide regions accessible to enzymatic cleavage of 
GHs. The gene products from the three identified GH3 genes may 
harbor the chitobiase activity required for complete hydrolysis of 
chitin. A bacterial GH3 protein with N-acetylhexosaminidase ac- 
tivity has previously been reported to have a function in the chitin 
utilization system (45). 

(iii) Other polysaccharide-degrading enzymes. The genes en- 
coding members of the GH families GH1, GH2, GH3, GH10, 
GH30, and GH43 represent candidates for the hydrolysis of non- 
cellulosic poly- and oligosaccharides and the side branches of 
hemicelluloses and pectins. For instance, endo-1,4-β-xylanase activity
has been described in the families GH10, GH30, and GH43.
However, the "Spartobacteria baltica" GH30 sequences cannot be
classified into any of the defined GH30 subfamilies, and the top
BLAST hit for the "Spartobacteria baltica" GH43 protein is a
β-xylosidase (GenBank accession number ACE82692) from Cellvibrio
japonicus. Since almost all characterized enzymes in GH10
are endo-1,4-β-xylanases, it is plausible that the sequence assigned
to GH10 in our study can exhibit this activity. The identified
GH16 and GH17 sequences are most likely involved in degradation
of laminarin, a β-1,3-glucan found mainly in brown
algae (46). Of note, we discovered a family GH119 gene in the
"Spartobacteria baltica" bin. Currently, GH119 contains only six
members in the CAZy database, and the only biochemically characterized
enzyme is an α-amylase (47).

Representatives from Verrucomicrobia have been shown to degrade
polysaccharides in soil (subdivision 4 [2], subdivision 2
[10], and termite subdivision 4 [6]). Recently, Martinez-Garcia et
al. (42) were able to identify coastal and freshwater Verrucomicrobia
as polysaccharide degraders using fluorescently labeled laminarin
and xylan in combination with single-cell genomics. The
coastal "Verrucomicrobium AAA168-F10" from the family Ver- 
rucomicrobiaceae (subdivision 1) contains 58 glycoside hydrolases 
putatively involved in the degradation of mucopolysaccharides, 
glycoproteins, peptidoglycan, celluloses, hemicelluloses, and gly- 
cogen. One of the three GH5 sequences identified in AAA168-F10 
also falls within subfamily GH5_46 and, although truncated, 
shows high similarity to the GH5_46 member of "Spartobacteria 
baltica" (see Fig. S5 in the supplemental material). 

Ecological role of "Spartobacteria baltica." Utilization of 
polysaccharides by bacteria has been demonstrated in aquatic en- 
vironments (17), but the identity and specific roles of the mi- 
crobes performing this process are still elusive (48). "Spartobac- 
teria baltica" has the genetic potential to use a variety of 
polysaccharides as carbon, nitrogen, and sulfur sources. In the 
marine environment, phytoplankton is a major source of such 
substrates, and a multitude of hydrolytic enzymes and sulfatases 
have been shown to be expressed in associated bacteria during the decay
of a phytoplankton bloom (49). Previous studies reported
a link between the dynamics of phytoplankton biomass
and Spartobacteria in freshwater lakes (50, 51); moreover, Arnds et
al. (12) found Spartobacteria cells attached to filamentous algae in 
a humic freshwater lake. 

Herlemann et al. mBio (mbio.asm.org), May/June 2013, Volume 4, Issue 3, e00569-12

FIG 4 Analyses of "Spartobacteria baltica" GH5 sequences. (A) Modular
structure of the GH5 protein sequence (gene id 2119805716 in Table S3). (B) A
maximum likelihood tree of selected bacterial GH5 catalytic module sequences,
including "Spartobacteria baltica" (gene id 2119806690 in Table S3).
(C) A maximum likelihood tree of the subfamily GH5_46, including "Spartobacteria
baltica" (gene id 2119805716 in Table S3), and selected bacterial sequences
from related GH5 subfamilies. The phylogenetic analysis was restricted
to the catalytic module.
[Images not reproduced. Panel A modules: GH5, CBM6, Fascin, FN3, DUF1349.
Panel B tips: GH5_1, GH5_4, Stackebrandtia nassauensis, "Spartobacteria baltica,"
Lentisphaera araneosa, GH5_41, GH5_7. Panel C tips: Ignavibacterium album,
"Spartobacteria baltica," Pedobacter saltans, Solitalea canadensis,
Fibrella aestuarina, Niastella koreensis, Chitinophaga pinensis,
Niastella koreensis, Cellvibrio japonicus, Brevundimonas sp.,
Agrobacterium radiobacter, Chryseobacterium gleum, Elizabethkingia anophelis,
Polaribacter irgensii; groups GH5_46 and GH5_1.]

In the central Baltic Sea, pronounced phytoplankton blooms
occur seasonally, with spring blooms being dominated by eukaryotic
phytoplankton and summer blooms being dominated by Cyanobacteria
(52). In a previous study, the seasonal dynamics of
surface water microbial communities in the central Baltic Sea (at 
the Landsort Deep) was investigated by 454 sequencing of ampli- 
cons of the V6 region of 16S rRNA genes (53). One of the most 
abundant OTUs in this data set is identical to the V6 region of the 
"Spartobacteria baltica" 16S rRNA gene. In the temporal study, 
the OTU displayed pronounced seasonal dynamics and peaked in 
July (with 5% of the reads) (see Fig. S6 in the supplemental material).
This coincided roughly with blooms of filamentous Cyanobacteria
(22), but the limited number of samples (n = 8) does not
allow meaningful statistics. Instead, we used the 16S data from 213
samples of the Baltic Sea transect study (21) to search for spatial 
correlations between the "Spartobacteria baltica" OTU and other 
OTUs. Interestingly, the most highly correlated OTU was a picocyanobacterium
(identical to Synechococcus/Cyanobium sequences
from freshwater [54] and from the Baltic Sea [55]), displaying
a Spearman rank abundance correlation of 0.80 (P value of
<10⁻¹⁶) (Fig. S7). Hence, the spatial data indicate a connection
to picocyanobacteria, but it should be noted that filamentous Cyanobacteria
were not accurately quantified in the study, and correlations
to these may therefore have been missed. Moreover, the
genomic findings indicate that substrates such as chitin may additionally
originate from eukaryotic phytoplankton. Besides crustaceans
and copepods, phytoplankton blooms of Thalassiosira and
Skeletonema are considered important sources of chitin since they
produce chitin strands to increase their buoyancy (56). These species
are highly abundant during spring phytoplankton blooms in
the Baltic Sea (57) and may therefore provide the substrate during 
this period. 
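
For readers unfamiliar with the statistic used above, a Spearman rank correlation is the Pearson correlation computed on ranks rather than on raw abundances, which makes it robust to the skewed distributions typical of OTU read counts. The following is a minimal standard-library sketch, not the authors' analysis code: the abundance values are invented for illustration, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative abundances of two OTUs across six samples.
otu_a = [0.01, 0.05, 0.12, 0.08, 0.00, 0.03]
otu_b = [0.02, 0.04, 0.20, 0.15, 0.01, 0.02]
rho = spearman(otu_a, otu_b)
print(f"Spearman rho = {rho:.2f}")
```

A significance test (the P value in the text) would additionally require the sampling distribution of the coefficient, which is omitted here.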

In summary, we have performed genomic analysis of the first 
aquatic representative of the Spartobacteria, one of the most abun- 
dant heterotrophic bacteria in the brackish Baltic Sea and other 
aquatic environments. The genome reveals a rich repertoire of 
polysaccharide-degrading genes, and the spatiotemporal data in- 
dicate ecological connections to phytoplankton. Further studies 
investigating seasonality and local distribution of microorganisms 
in the Baltic Sea will give more details on the interaction between 
aquatic Spartobacteria and phytoplankton; moreover, the enzy- 
matic characterization of the glycoside hydrolases can give insight 
into their mode of action and substrate specificity. 

MATERIALS AND METHODS 

Sampling, DNA preparation, and sequencing. The water sample was 
obtained on a research cruise (MSM0803) of the RV Maria S. Merian in 
June and July 2008 at 59°47.88'N, 24°46.75'E (see Herlemann et al. [21] 
for details). Water samples for DNA analysis were filtered (0.22-µm-pore-size
white polycarbonate filters), and DNA was extracted according to
Weinbauer et al. (58). The sample was sequenced at the Swedish Institute 
for Communicable Disease Control using 454 pyrosequencing (Roche) 
and a protocol for library preparation that accommodates minute amounts of 
sample DNA (59). 

Metagenome assembly, binning, and annotation. 454 pyrosequenc- 
ing reads were assembled using the Newbler assembler (Roche) with de- 
fault parameter settings except that the "large" flag was used. Contigs with 
a size of >2 kb were subjected to phylogenetic binning by an emerging 
self-organizing map using the ESOM analyzer (60) based on tetranucle- 
otide frequency distributions of contigs (23). The same parameter settings 
and initial data normalization as those used in Dick et al. (23) were ap- 
plied, but a 50- by 80-pixel grid was used. Projecting contigs in the size 
range of 1 to 2 kb on the ESOM map that had already been generated with 
the longer contigs resulted in an additional 168 contigs (231,753 bp in 
total) falling in the "Spartobacteria baltica" region, indicating that a frac- 
tion of the genome was missing among the >2-kb contigs. However, since 
the approach is unreliable for contigs <2 kb (23, 24), to minimize the risk 
of assigning external contigs to the genome, we restricted the analysis to 
contigs with a size of >2 kb. For making the spiked metagenome, the draft 
genome of Chthoniobacter flavus Ellin428 was downloaded from NCBI 
and split into 5-kb-long "contigs" and added to the metagenome. When 
running the ESOM analyzer on this, an 80- by 110-pixel grid was used, 
with other settings as described above. A Perl program for generating 
input to the ESOM analyzer can be downloaded at
https://github.com/tetramerFreqs/Binning. For coloring the contigs in the map according to 
(probable) phylum affiliation, contigs were searched with BLASTx (61) against 
the NCBI nr database, and based on the BLAST output, MEGAN (62) was 
used to extract phylum-level annotations. 
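
The input preparation for binning described above can be sketched in Python. This is an illustrative sketch only, not the Perl program referenced above: the function names are hypothetical, and the choice to merge each tetramer with its reverse complement (giving 136 features) and to skip windows containing ambiguous bases is an assumption rather than something stated in the text.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement (assumed convention)."""
    return min(kmer, reverse_complement(kmer))


# The 136 strand-independent tetramers, in a fixed order.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)


def split_into_chunks(seq, size=5000):
    """Cut a reference genome into consecutive pseudo-contigs of `size` bp.

    The final piece may be shorter than `size`.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def select_contigs(contigs, min_length=2000):
    """Keep only contigs longer than `min_length` bp (here, >2 kb)."""
    return {name: s for name, s in contigs.items() if len(s) > min_length}


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of the canonical tetramers in `seq`."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other codes
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CANONICAL_TETRAMERS)
    return [counts[k] / total for k in CANONICAL_TETRAMERS]
```

Each contig then becomes one row of 136 frequencies; any normalization applied before training the map (as in reference 23) is a separate step not shown here.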

All contig sequences were annotated with the IMG/M metagenome 
analysis pipeline (see Table S3 in the supplemental material) (63). Auto- 
matic annotations with functional predictions were also improved man- 
ually with the annotation platform provided by Integrated Microbial Ge- 
nomes (64). Metabolic pathways were reconstructed using MetaCyc (65) 
as a reference data set. Detailed information about the automatic genome 
annotation can be obtained from the JGI IMG website
(http://img.jgi.doe.gov/w/doc/about_index.html). 

Construction of the 16S rRNA gene tree. The metagenome revealed 
the complete 16S rRNA gene, which was used for phylogenetic analysis. 
The phylogenetic 16S rRNA tree was constructed using the ARB program 
suite (66). All 16S rRNA spartobacterial sequences available in the Silva 
release 111 NR (33) were downloaded from the Silva browser (total of 631 
sequences), the full-length sequence of "Spartobacteria baltica" was 
added, and "Candidatus Methylacidiphilum infernorum" was used as an 
outgroup. A core tree was estimated from 1,012 unambiguously aligned 
sequence positions of all nearly full-length (>1,200 bp) sequences (633 
sequences), using maximum-likelihood analysis (RAxML) with rapid 
bootstrapping (1,000 replicates) and the GTRMIXI rate distribution 
model provided in the ARB package (Fig. 3). A total of 435 short se- 
quences (>300 bp), positionally filtered by base frequency (50%), were 
added without changing the global tree topology by using the ARB parsi- 
mony tool (data not shown). Based on these results, a phylogenetic tree 
containing all sequences of >300 bp from the "LD29" lineage, including 
Chthoniobacter flavus as a reference and "Xiphinematobacteraceae" as an 
outgroup, was extracted (total of 168 sequences) (see Fig. S4 in the sup- 
plemental material). Phylogenetic trees were graphically processed using 
FigTree (http://tree.bio.ed.ac.uk/software/figtree/). 
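
The sequence partitioning and positional filtering described above were carried out in ARB. The following Python sketch only illustrates the general idea and makes two assumptions that the text does not confirm: that sequence length is counted as ungapped positions, and that the 50% base-frequency filter keeps an alignment column when its most common base occurs in at least half of all sequences. All function names are hypothetical.

```python
from collections import Counter

GAP_CHARS = "-."


def ungapped_length(aligned_seq):
    """Count non-gap characters in an aligned sequence."""
    return sum(1 for c in aligned_seq if c not in GAP_CHARS)


def partition_by_length(aligned, core_min=1200, short_min=300):
    """Split sequences into a core set (>core_min bp) and a short set.

    Sequences of `short_min` bp or fewer are dropped.
    """
    core, short = {}, {}
    for name, seq in aligned.items():
        n = ungapped_length(seq)
        if n > core_min:
            core[name] = seq
        elif n > short_min:
            short[name] = seq
    return core, short


def conserved_columns(seqs, min_fraction=0.5):
    """Return indices of columns whose most common base reaches the threshold.

    The fraction is taken over all sequences, gaps included (an assumption).
    """
    keep = []
    for col in range(len(seqs[0])):
        bases = [s[col].upper() for s in seqs if s[col] not in GAP_CHARS]
        if not bases:
            continue  # column contains only gaps
        top_count = Counter(bases).most_common(1)[0][1]
        if top_count / len(seqs) >= min_fraction:
            keep.append(col)
    return keep
```

In this sketch the core set would be used for tree estimation, and the short set would be added afterward using only the retained columns.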

Glycoside hydrolase identification and annotation. The domain 
structures of automatically annotated glycoside hydrolases were manually 
curated using SMART (67), Pfam (68), and the Conserved Domain Database
(69). Glycoside hydrolase family annotations were revised by comparison
to the Carbohydrate-Active Enzymes (CAZy) database (http://www.cazy.org)
(18). For the phylogenetic analysis, sequences of GH5 catalytic 
domains were aligned using MUSCLE (70), and the phylogenetic trees 
were generated using PhyML (71). Bootstrap support was calculated using 
100 replicates. Subfamilies GH5_1 and GH5_4 were used as outgroups in 
the phylogenetic analysis. 

Phylogenomic analysis. A phylogeny was estimated using a set of 31 
conserved single-copy phylogenetic marker protein sequences, down- 
loaded as HMMER3 HMM models (http://hmmer.janelia.org) from 
Pfam 26.0 (68) (PF00163, PF00203, PF00281, PF00347, PF00416, 
PF00828, PF03118, PF11987, PF00164, PF00237, PF00318, PF00366, 
PF00572, PF01000, PF03588, PF13393, PF00181, PF00238, PF00333, 
PF00410, PF00573, PF01193, PF03947, PF13603, PF00189, PF00252, 
PF00344, PF00411, PF00750, PF02403, PF10458). "Spartobacteria baltica"
contigs were six-frame translated and searched with the Pfam HMM profiles,
as were the protein sequence complements of reference genomes. 
Marker proteins were identified in "Spartobacteria baltica" bin contigs 
and 44 microbial reference genomes based on the selection in the 2009 
GEBA tree (72). The "Spartobacteria baltica" bin marker proteins were 
identified in twelve different contigs after six-frame translation. The se- 
quences were aligned with Probcons (73) and analyzed with Zorro (74). 
Positions with a Zorro score of ≥6 were selected, and individual alignments
were concatenated, producing an alignment with 7,597 well- 
aligned sites. A maximum likelihood tree was calculated with RAxML 
7.2.8 using the LG substitution matrix (75) and a gamma model of rate 
heterogeneity (PROTGAMMALGF). 
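
The six-frame translation step mentioned above can be illustrated with a short Python sketch. It assumes the standard genetic code and writes "X" for any codon containing an ambiguous base; the function names are hypothetical, and the tool actually used for translation is not named in the text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq):
    """Return translations of all six reading frames, keyed '+1'..'-3'."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

The translated frames would then be searched with the marker profiles; stop codons appear as "*" and are left in place so that a downstream search can handle them as it sees fit.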

Nucleotide sequence accession number. The complete metagenome 
(all sequence reads) of the sample has been deposited in the European 
Nucleotide Archive under accession number ERP002583. 